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Abstract 

Nonlinear/non-Gaussian filtering has broad applications in many areas of life sciences 
where either the dynamic is nonlinear and/or the probability density function of un- 
certain state is non-Gaussian. In such problems, the accuracy of the estimated quan- 
tities depends highly upon how accurately their posterior pdf can be approximated. 
In low dimensional state spaces, methods based on Sequential Importance Sampling 
(SIS) can suitably approximate the posterior pdf. For higher dimensional problems, 
however, these techniques are usually inappropriate since the required number of par- 
ticles to achieve satisfactory estimates grows exponentially with the dimension of state 
space. On the other hand, ensemble Kalman filter (EnKF) and its variants are more 
suitable for large-scale problems due to transformation of particles in the Bayesian 
update step. It has been shown that the latter class of methods may lead to subop- 
timal solutions for strongly nonlinear problems due to the Gaussian assumption in 
the update step. In this paper, we introduce a new technique based on the Gaussian 
sum expansion which captures the non-Gaussian features more accurately while the 
required computational effort remains within reason for high dimensional problems. 
We demonstrate the performance of the method for non-Gaussian processes through 
several examples including the strongly nonlinear Lorenz models. Results show a re- 
markable improvement in the mean square error compared to EnKF, and a desirable 
convergence behavior as the number of particles increases. 

Keywords: Bayesian Estimation, Ensemble Data Assimilation, Gaussian Sum 
Expansion, Environmental Control 



Data assimilation is an essential tool for reliable prediction of natural and physical 
phenomena. It has broad application in many fields of science and engineering such 
as flood forecasting, weather prediction, contaminant tracking, wild fire tracking, oil 
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exploration, etc. [THl [231 1131 E]- It involves incorporating sparse observational 
data (usually corrupted by noise) into computer models to obtain reliable estimates 
of unknown state and parameters of a dynamical system. 

Many data assimilation techniques have been suggested to date that usually fall 
into one of the two main categories: variational and sequential. Variational methods 
deal with batch of data at specific time interval and aim to find the best possible 
estimate by minimizing a penalty function which usually includes the model, initial 
and boundary condition, and observational errors defined under certain statistical 
hypothesis [3]. Examples of this kind include 4D-var [30] and representer method 
[1]. On the other hand, in sequential data assimilation, which is also the approach 
adopted in this paper, the model variables are updated every time observations be- 
come available and may thus be computationally less expensive than the former [T2] . 
Examples of such techniques include Kalman filter and its variants [TTJ [UJ, [5]. 

In recent years, there has been a significant interest in developing sequential filter- 
ing strategies that could more accurately predict the state of a nonlinear process when 
the number of unknown variables is considerable, van Leeuwen [31] gives an excellent 
review of recently developed nonlinear filtering schemes. These methods are mostly 
based on a Monte Carlo approximation of probability distribution function (pdf)'s of 
concern. 

There exists Sequential Importance Sampling (SIS) based filters that represent a 
pdf as the weighted sum of delta functions centered at finite number of particles. 
In the simplest form, the weights are updated upon receiving observations while 
the particles remain intact [9]. Such approach usually requires the user of SIS to 
carefully choose a proposal density function from which the particles are drawn since 
it is desirable to have particles in areas with high likelihood. 

There are also ensemble Kalman filter (EnKF) based algorithms that approximate 
a pdf using i.i.d. samples (i.e., equal weights). Upon receiving observations, the 
weights remain equal while the samples (particles) are moved. EnKF was proposed 
by Evensen [TO] [6] and has been successfully applied to many large-scale nonlinear 
estimation problems so far [TTJ |15j [T3J El] . Much of the success of EnKF over SIS 
filter for large-scale problems is attributed to the fact that it nudges the particles to 
areas of higher likelihood [31] • Moreover, it is attractive in that it is conceptually 
simple, easy to implement and does not require computing the adjoint operator as 
needed in variational setting. Despite all its success in the nonlinear estimation arena, 
its tendency to generate unimodal pdf's may result in suboptimal results when the 
posterior pdf is far from Gaussian distribution [23l l3Tj . This is mainly attributed 
to the fact that in the Bayesian update step, EnKF approximates the prior density 
function with a Gaussian pdf [TTJ. 

As computers become more powerful, resolution and complexity of numerical mod- 
els tend to increase resulting in strongly nonlinear systems and hence there is a need 
to develop new strategies that could more accurately capture the non-Gaussian statis- 
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tics J3H E2] • It is this fact that led to the development of compound filters that try 
to combine the EnKF with SIS in order to obtain a more accurate representation of 
posterior pdf . Examples of such attempts can be found in (23J [32] . The idea behind 
these methods is to use the posterior pdf obtained from the EnKF as the proposal 
density for the SIS filter. Though these methods are attractive in that they approx- 
imate the posterior pdf more accurately than the EnKF does, in the opinion of the 
author, the two stage filtering introduces new tuning parameters (c.f. [23]) and im- 
poses extra computational cost as a result of explicit density kernel estimation which 
makes it hard to justify for large dimensions. The above mentioned difficulties are 
what motivated this study to reexamine the fundamentals of ensemble filtering and 
to possibly address some of these issues. 

In what follows we introduce a new methodology, the ensemble Gaussian sum filter 
(EnGSF) which tends to approximate the non-Gaussian statistics more accurately. 
This paper is organized as follows: In section [T] we introduce the general filtering 
problem. The theory and details of the ensemble Gaussian sum filter (EnGSF) is 
presented in section [2] Section [3] contains several numerical examples that shows the 
performance the methodology for nonlinear estimation problems. Finally, conclusion 
and future work are presented in section |4j 

1. Problem Statement 

Filtering in control and estimation theory refers to a process comprised of two 
stages, one of forecast and the other of data assimilation by which one can anticipate 
the dynamics of natural and physical systems. To further clarify, let us assume that 
the following Morkov model represents the dynamics of the system under study 

Xfc+i = /(x fc ) + e k (1) 

where /(•) : R m —> R m is the so-called model operator, x& £ R m is the state of 
the system at time k and e £ R m is the noise component representing the modeling 
uncertainty. Such equation is usually derived from discretization of a set of partial 
differential equations. 

Moreover, suppose that observational data are available at distinct instances in 
time and are related to the uncertain state variables through the following equation: 

y k = Hx fc + r k (2) 

Here, y k £ R n is the vector of observations at time k, H nxm is the measurement 
operator and r £ R n is the measurement noise vector. 

The uncertain state variable x*. can be fully described by a pdf. The assimilation 
step of filtering then involves exploiting the observational data to improve the knowl- 
edge on the pdf of the process while the forecast step refers to evolving the state 
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variables using model equation [TJ 



In terms of pdf's, the forecast step involves finding prior pdf at time k using 
information up to time k — 1, p(xfc|y 1 . fc _ 1 ), through Chapman-Kolmogorov equation 



The assimilation step involves obtaining the posterior pdf which contains information 
up to time k, p(x fc |y 1:fc ), using Baye's theorm 



The resulting pdf's in equations ([3]) and (j4|) can not in general be determined an- 
alytically unless for linear dynamics under certain assumptions [TTj HB] . For linear 
dynamics in which only Gaussian pdf's are involved, an optimal filter exists which 
is known as Kalman filter (KF) [T7] in the relevant literature. For nonlinear /non- 
Gaussian cases, however, there is practically no optimal filter. This has led to the 
development of numerous sub-optimal filters that seek an estimate either by lineariz- 
ing the nonlinear model (e.g. extended Kalman filter (EKF)) or approximating the 
pdf of the process by a collection of particles (e.g. EnKF, SIS filter) |3TJ [TTj, [2]. 

2. Ensemble Gaussian Sum Filter 

A non-Gaussian pdf can be approximated as the sum of finite number of Gaussian 
kernels known as Gaussian sum expansion [Tl 128]. To the best of author's knowledge, 
the early attempt to use such expansion for nonlinear filtering problems is the work of 
Sorenson in early 70's |28] . It is desirable since the sum of Gaussian kernels is always 
a valid density function regardless of the number of terms used in the expansion and 
converges uniformly to any desired density function |28j. 

In this section, we first present the fundamentals of Gaussian sum expansion and 
subsequently introduce an improved filtering methodology for large-scale nonlinear /non- 
Gaussian dynamics. We call the new method, the ensemble Gaussian sum filter (En- 
GSF) since it is built upon the ideas of ensemble data assimilation and principles of 
Gaussian sum filter. 

2.1. Gaussian sum expansion 

Since the Gaussian kernel is considered amongst the delta family of positive func- 
tions, any arbitrary pdf g can be approximated by as [28J, 



12] 



p(Xfc|yi:fc_l) = / p(Xfc|x fe _!) K X fc-l|yi:fc-l) d *k-l 



(3) 




(4) 
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gx converges uniformly to g as E — > 0. Therefore, a discrete approximation to g(x) 
may be obtained as follows: 



9* 



i=l 



CKi iV(x - u i; Ej 



(6) 



where 



ft; 



we refer to Oj's as the weights of Gaussian kernels throughout this paper. It is 
notable as the variance E decreases, the Gaussian kernels become closer to Dirac 
delta functions and the approximation in the limit of zero variance become identical 
to the one used in the SIS filter. 

2.2. Bayesian update 

In this section, the Baye's update equation, eq. Q, is approximated using the 
Gaussian sum expansion as the basis of our approximation. In the following deriva- 
tion, we assume that the measurement noise in eq. (J2j) is distributed according to 
N(0, R). We also drop the time subscript k and denote the prior pdf by p(x) to avoid 
complication in notation. Suppose that prior pdf is given by: 



p(x) = 5>{ iV(x-xf,E^ 



i=i 



One can then show that the posterior pdf is given by 



(7) 
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p(x|y) = 2>? tf(x-x?,E?) 



where 
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xf + SfH T 



hs/h t + r 



(y - Hxf ) 
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I - e/ h t 



hs/h t + r 



H E 



/ 



(11) 



Throughout this paper, we use superscripts "f" and "a" to represent forecast (prior 
to the receipt of data) and analysis (after assimilation of data) respectively. 
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According to the above formulas, a choice of (af , x[ , X[ ) uniquely determines 
(a",x",E"). Therefore, an appropriate choice of prior parameters is crucial for ob- 
taining an accurate estimate of the posterior pdf. 

The weights and particle supports, (af,xf), can be assigned upon initialization of 
the filter whereas the covariance matrices Ej's will be calculated at each assimilation 
step from the statistics of the ensemble of particles. Therefore, it is convenient to 
start filter with i.i.d samples of the prior. It is noted that sampling from the prior 
when the number of variables is large is not a trivial task. However, for Gaussian 
pdfs with special covariance structure, there are methods based on Fast Fourier 
Transform (FFT) [8j [M] or polynomial expansion of the square root of covariance 
matrix [7J by which one can generate initial realizations in a reasonable period of 
time. The fundamental assumption here is that the initial setup of the filter will be 
forgotten after forward integration of a dynamical system in a certain period of time. 

Moreover, it is both practical and convenient to choose the same covariance matrix 
for all particles. Such assumption significantly reduces the cost of data assimilation. 
In what follows, we give the theory for choosing an appropriate covariance matrix. 

2.2.1. A good choice o/E^ 

Choosing an appropriate covariance matrix for individual Gaussian kernels is a 
crucial step since it ultimately determines the direction and amount of movement 
for each individual particle. This is usually discussed under the rubric of bandwidth 
selection in density estimation literature [27] . 

£/ can be found by minimizing the Mean Integrated Squared Error (MISE) be- 
tween the true prior pdf, p(x), and the one estimated using Gaussian sum, (fe(x) 



where E{-} indicates the theoretical expectation. Here we eliminated superscript "/" 
for the sake of clarity. According to the above equations, the problem of determining 
the optimal choice of £ is a trade-off between bias in the approximation of a pdf and 
variance of the probability weights. In addition, the optimization procedure requires 
the complete knowledge of the prior probability which is usually not available. This 
makes the optimal choice to some extent subjective though it is possible to find 
approximate solutions for it. It is shown in that assuming a Gaussian pdf as the 
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underlying density at this stage is an appropriate assumption in many cases. Based 
on this assumption an approximate covariance matrix is obtained as [27] : 



£ = cN-^P (13) 

where N is the number of particles, m is the dimension of state vector, P is the 
theoretical prior covariance and c is a constanct defined by: 



2 

4 \ m+4 



m + 2 



(14) 



Note that c varies between 0.85 and 1.12 with an assymptotic value of 1.0 for large 
m. Therefore it can be safely replaced by 1.0, leading to: 

The above equations indicate that S depends upon two factors: (a) the number of 
particles and (b) the dimension of state space. Given a fixed number of particles, as 
the dimension of state space increases, larger covariance matrices needed for individual 



particles to efficiently approximate prior pdf. Note that equation (15) is sometimes 
referred to as the Silverman's rule of thumb and is only optimal for Gaussian pdfs. It 
tends to oversmoothen the non-Gaussian densities and hence it is usually beneficial 
to use a smaller covariance matrix in practice as suggested also in J27J . In this paper, 
we adopt a slightly modified equation for the covariance of individual samples which 
is given by: 

£/ = N~^P (16) 
where we have changed the power of N such that the covariance is always smaller 



than that obtained by equation (15). It is worth noting that the expression for 
given by equation (16) is consistent with the bandwidth suggested by Scott [26] based 
on the theory of histograms and tend to generate more satisfactory estimates in lower 
dimensions according to authors' experience. For higher dimensional problems, the 
effect of this change is minimal as £* — > P. 

At this point, inspired by the ensemble data assimilation strategies, we replace 
the true (theoretical) covariance P by the weighted ensemble (empirical) covariance 
arriving at the following equation for £* : 

£/ = N~^P e (17) 
where the weighted ensemble covariance matrix is defined by: 

P e = ^a i (x i -x)(x i -x) T (18) 
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where x = ^\ the weighted empirical mean. Eq. (1171) is the expression used 

in all the numerical examples performed in this paper. 

2.3. Forecast Step 

To predict the state of a system in the absence of data, one needs to calculate the 
Chapman-Kolmogrov integral equation, 



p( x fc|yi:fc-i) = J p(xfe|x fc _i) K x fc-i|yi : fc-i) rfx fe-i 
i 

= ^a i iV(x fc -/(xr),Q J ) 



i=i 



where Qj is the covariance of each sample after integration. In EKF type algoithms, 
this covariance is calculated using derivatives of the nonlinear function /(x) at the 
current estimate. This is the approach adopted in [28] for example. In this paper, 



however, we replace Qj with E^ given by equation (17) derived from the statistics of 
the forecast ensemble. Based on the the above formula, calculating the integral only 
involves forward integration of the model equation (jlj for individual particles. 

2.4- Degeneracy and Resampling 

Since the methodology discussed in this paper assumes that the weights associated 
with particles may be different, it is expected that over time these weights become 
inconsistently non-uniform and after certain time a few particles carry most of the 
probability mass. This phenomenon which is named degeneracy in filtering literature 
is a common attribute of almost all particle filtering techniques [31]. For the SIS 
particle filter, it is shown that the variance of the weights can only increase over 
time [5] and hence degeneracy is unavoidable. Degeneracy is undesirable since one 
makes extra effort updating particles that are of minimal weight in addition to the 
issues that may arise in approximating the covariance when needed. Moreover, for 
the times when there is no data (i.e., prediction steps) a large noise in the forward 
integration of the stochastic dynamical model for particles of high weight may take 
those to locations far from the true state resulting in an undesirable forecasts whereas 
this phenomenon is usually not present in case with equally probable particles. 

To reduce the effect of degeneracy, one usually resamples the posterior pdf when 
the number of particles with considerable weight decreases. Resampling generally 
involves ignoring particles with small weights and focusing on the particles that carry 
larger weights. The reader if referred to [H] for a detailed discussion on different 
resampling strategies. An appropriate measure of degeneracy is the effective number 
of particles, N e ff introduced in [T9] and is defined by, 

N 

N eff = ~ -— (19) 

;/ l + var(a*) y ' 



where a* is the so-called true weight. This expression cannot be evaluated exactly, 
however, it can be estimated using the following formula [HJ E]: 

Neff = (20) 

In SIS filter, one usually resamples if N e ff becomes less than a given treshold, 
Ntreshoid- In our approach, we prefer to resample every time data are assimilated 
since EnGSF relies on the covariance matrix obtained from the particles. A typical 
resampling algorithm is given below: 



Algorithm 1 : Resample 

[{*Lt> °iut}jLi} = R esam P le [{ x L,«L}£i] 

• Initialize CDF: Co = 

• FOR i = 1 : N 

- Construct CDF : q = q_i + a\ n 

• END FOR 

• Draw N random number {Wy^according to U[0, 1] 

• Start at the bottom of CDF: i — 

• FOR j = 1: N 

- WHILE a < u j 

* % = i + 1 

- END WHILE 

- Assign sample: = 

- Assign weight : a 3 out = N^ 1 

• END FOR 



Note that the weighting step of EnGSF is quite similar to that of SIS, however, 
particles are moved to the best possible positions after weighting unlike the SIS filter in 
which particles are fixed in their positions. Transformation of particles after weighting 
is what prevents filter divergence even in high-dimensional cases with small number 
of particles. 
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From another perspective, EnGSF may be looked as a collection of EnKFs acting 
together and are weighted according to their importance. The importance is mea- 
sured by how close a specific particle (i.e., the mean of a Gaussian kernel) is to the 
measurement with respect to a suitable distance measure. 

It is a known fact that resampling may reduce the diversity of particles and hence 
modeling uncertainty is an essential tool in such settings to introduce more diversity. 
Note that in the EnGSF, each analysis particle carries a Gaussian kernel which may 
be used to introduce more diversity. In particular, when only one particle carry all 
the probability mass (i.e., a" = 1.0 for x"), we can perform a Gaussian re-sampling 
step by the following transformation: 

A^ xiV = ■ ■ ■ ,*S] mxN + {A' f + K(Y' - HA'f)} mxN (21) 

where 

K = Z f H T (HS / H T + R) ~ X , (22) 

= A' f A' fT (23) 

and columns of the so-called measurement and state perturbation matrices are given 
by: 

A« {iJ) =(a j N*&y (xj-x7) (24) 



V 

Y OJ) 



y + r„ rj ~iV(0,R) 



(25) 



The above equations are the same introduced in [S] . It can be shown that samples 
obtained using eq. (21) will asymptotically (i.e., in the limit of very large ensemble) 



have the desired mean and covariance given by eqs. (10) and (11) respectively. Thus 



Gaussian resampling involves adding adequate perturbations to the analysis particles 
to achieve the desired spread. Though not explored in this work, it may be beneficial 
to use the resampling ideas from the recently advised Merging Particle Filter 

2.5. The complete algorithm for EnGSF 

The complete algorithm for the EnGSF is summarized below: 



Algorithm 2: EnGSF 
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[{x* o*}^] = EnGSF[{xf' _1 , a k ~ 1 }fL 1 ,y k ] 

• FOR i = 1 : N 

- Integrate according to forward model Q 

[{xf,af}f =1 ] = fwd_model[{xr\ar}f =1 ] 

• END FOR 

• FOR i = 1 : N 

- Calculate weights according to eq. ^ 

- Calculate analysis particles according to eq. (10) 

• END FOR 

• Resample using algorithm 1 

- [{xj,a?}£i] = ResampleKx^af}^] 



3. Numerical Illustration 

In this section, we provide four numerical examples to test the performance of our 
method in forecasting state of nonlinear dynamics. The first two examples are one 
dimensional and have previously appeared in [23J. The third and forth examples are 
the celebrated three and forty dimensional Lorenz models EI] which have been 
studied extensively in the data assimilation literature. 

3.1. Example 1: One- dimensional Bay esian update 

In this example, we study one step of a non-Gaussian Bayesian update. We consider 
prior and likelihood pdf 's of the following forms 

/ \ 1 T 1 r x-1.5 \2 1 /■ g+l.S sal 

p(x) = - e z ( o.i > +e 2^ o.i > (26) 
rj L J 

p(d\x) = -. 1 e~^ ( ^r )2 (27) 
V ' ; v/2tt(0.1) 

Here 77 is a normalization constant so that p(x) dx = 1. Such set up may 
happen in nonlinear dynamics when the system is in transition between stable points 
as will be demonstrated in the next example. We draw i.i.d samples from the prior 
and estimate the posterior pdf using different methods for various number of particles. 
The true pdf's are plotted in Fig. (JIJ. 

Fig. ([2]) shows the posterior pdf approximated using three different data assim- 
ilation methods, namely EnKF, ensemble square root filter (EnSRF) and EnGSF. 
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Exact Bayesian 




Figure 1: Prior, likelihood and the true posterior pdf obtained by direct multiplication of prior and 
likelihood at 10000 uniform grid points in [—4, 4] 



N = 200 



N = 500 



N = 1000 




-2 -1 



Figure 2: Posterior pdf obtained from different filtering techniques for various number of particles 
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Clearly, increasing the number of particles has little effect on the posterior pdf ob- 
tained using EnKF and EnSRF while that of EnGSF converges to the true posterior. 
This may be attributed to the fact that EnKF and its variants are optimal w.r.t only 
the first two statistical moments. Another interesting observation one can make from 
these plots is the difference between EnKF and EnSRF while both methods are from 
the same family. It is clear that EnSRF is capable of detecting some bimodal behavior 
whereas EnKF tend to smooth out bimodality. This discrepancy may be attributed 
to the fact that in EnKF, observations are treated as random variables and perturbed 
in order to maintain a reasonable variance. This results show another bad side effect 
of perturbing observations in addition to those pointed out in [33] . 

The movement of particles are also plotted in the same figure. The bottom and 
top particles are those of prior and posterior respectively. The size of particles is 
proportional to their weights. It is clear that the transformation of particles is of 
significant importance in capturing the accurate posterior pdf. Owing to the fact that 
in EnKF particles are assumed equally weighted, ideally one needs a transformation 
that takes the i.i.d samples of prior and outputs i.i.d samples of posterior if one wants 
to achieve the correct pdf. Clearly the transformation performed in the EnKF does 
not do so for a general pdf. 

A measure of the difference between two pdf may be obtained using Kullback- 
Leibler (KL) divergence, we use this measure in order to study the convergence 
behavior of different data assimilation methods in this paper. The KL divergence for 
continuous probability density functions p and q is defined as: 

D K L(p\\q)= \og P -^\ p{x) dx (28) 

Here, p is chosen to be the true pdf while q is the approximate pdf obtained using 



one of the data assimilation techniques. A discrete approximation of eq. (28) may be 
obtained by: 

D K L{p\\q) = S>g^ P(*<) Ax * ( 29 ) 

Fig. ([3]) shows the KL divergence curve for the three methods discussed in this 
paper. Clearly, the convergence of EnKF and EnSRF is insensitive to the number 
of particles, N, used in the process of data assimilation. EnGSF, however, shows a 
monotonic convergence as N increases. 

3.2. Example 2: One- dimensional nonlinear stochastic differential equation 

In this example, we apply the same methods discussed in the previous example and 
the Sequential Importance Resampling (SIR) filter to predict the state of a nonlinear 
stochastic differential equation (SDE) using noisy and sparse observational data. Note 
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Figure 3: The convergence plot of different methods according to KL divergence measure 

that SIR filter is the same as SIS filter in which one performs resampling after each 
data assimilation step. We consider the following ltd stochastic differential equation 



Here, £(t) is a Gaussian white noise representing the modeling uncertainty and 
k determines the magnitude of the noise. The deterministic part of the above SDE 
f'{u) = 4u(t)—4:u(t) 3 corresponds to the potential function of the form f{u) = 2m 2 — u 4 
for which there are three equilibrium points, two stable at u = ±1 and one unstable 
at u — 0. 

For k greater than a certain threshold, the state of the above SDE can change 
from one stable fixed point to the other. Keeping the amount of noise small, however, 
makes such transitions rare. The pdf of state while system is in transition between 
stable fixed points is usually non-Gaussian. Therefore, a possible test to assess the 
performance of a nonlinear filter is to observe its behavior in tracking such transition 



Fig. (|4j) shows an exemplar of the true state and observations generated using 
k = 0.7, u(0) = 0.8 and variance of the noise R = 0.1. 

Fig. ^ shows the estimated mean for different schemes with N = 100. As is clear, 
EnKF and EnSRF are slow in making the transition as also reported in [23]. Also, 
the difference between the two is minimal as expected in case with larger number 
of particles. It must be noted that EnSRF may result in a slightly better estimates 
due to using a modified gain matrix in the update step which eliminates the need of 
perturbing observations. However, in the experience of the authors its limitation for 
incorporating measurements serially limits the use of vectorized computer architecture 
and may result in excessive overhead when the number of data is large. Moreover, the 



(SDE): 



du{t) 
dt 



4u(t) - 4u(t) 3 + k£ (*) 



(30) 



[23J 



14 




transformation of the gain matrix may not be accurate when covariance localization 
is applied ans some care must be taken as pointed out in |33j. 

It is a known fact that SIR filter does not track the transition effectively when 
the number of particles is small which is consistent with the results obtained here. 
EnGSF, however, tracks the transition more accurately. This is attributed to the 
fact that the posterior pdf approximated by EnGSF tend to be more accurate than 
those of other schemes in areas of non-Gaussian behavior. As a result of this, the 
time averaged root mean squared error (RMSE) of the estimate obtained by EnGSF 
is almost half the one of EnKF (0.33 vs. 0.64). Though we have reported the mean 
here, it must be noted that the mean may not be an appropriate measure to compare 
different algorithms in case of non-Gaussian/nonlinear problems as also pointed out 

Fig. ^ shows the posterior pdf at the time of transition, (i.e., t = 4). The optimal 
posterior density function is obtained using SIR filter with 10,000 particles. While 
EnKF, EnSRF and SIR produce a biased and unimodal estimate of the posterior, 
EnGSF results in a more accurate estimate of the posterior pdf which captures the 
non-Gaussian features effectively. 

3.3. Example 3: Three dimensional Lorenz63 model 

The Lorenz63 model j2H] is a set of three coupled nonlinear ordinary differential 
equations defined by: 

dx(t) dy(t) dz(t) 

—— = 7 ( y _ x), —jj- = px-y-xz, —jj- = xy- /3z (31) 

where x(t), y(t), z(t) are time dependent unknown variables. The commonly chosen 
parameters are 7 = 10, p = 28 and (3 = 8/3. To integrate the above equations, we 
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Figure 5: Mean estimate obtained from different data assimilation techniques with N = 100 particles: 
(a) EnKF (b) EnSRF (c) SIR and (d) EnGSF 



use a forth order Runge-Kutta scheme with a time step of At = 0.01. The Lorenz63 
model has been the subject of many data assimilation studies due to its nonlinear 
and chaotic nature (c.f., [T4J [25j E3j). Here, we study the behavior of EnGSF and 
EnKF in tracking the state of the lorenz63 model. 

A reference solution is generated by integrating the model in the time interval 
t G [0,100] with the initial condition (x ;y ;z ) = (1.508870; -1.531271; 25.46091) 
and a diagonal model error covariance with variances equal to 2.0, 12.13 and 12.31 
corresponding to the error growth over one time unit for x, y and z respectively. These 
are the same values used in reference [14J. The stochastic term is applied as dq = 
ay/A~i where a 2 is the variance and d£ is drawn from unit normal. Observations 
are collected for all variables by adding Gaussian noise with standard deviation of 
2.5 to the reference solution and the distance between consecutive observations are 
assumed At b s = 0.5. At this level of sparsity and noise, it is expected that non- 
Gaussian features become important. For the data assimilation, we used the same 
modeling error covariance by which the reference solution is obtained. The initial 
realizations are generated from a multivariate normal distribution having mean equal 
to (xo; yo; z Q ) and covariance equal to 4I 3 . The number of particles is set to iV = 200 

Fig. ( 3.3[ ) show the reference solution for x and the absolute value of error between 
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Figure 6: Posterior pdf at t = 4 obtained from different data assimilation techniques with N = 100 
particles: (a) EnKF (b) EnSRF (c) SIR and (d) EnGSF. The optimal posterior pdf is obtained using 
SIR filter with N = 10, 000 particles 



the reference solution and the estimates obtained using EnKF and EnGSF. Similar 
plot is shown in Fig. ( |3.3 ) for the y variable. Clearly, large errors take place in the 



transition areas (such as t = 25) as also pointed out in [14J. However, we observe 
that EnGSF results in lower errors compraed to that of EnKF in such areas. For 
this experiment, we observed a time-averaged RMSE of 3.42 for EnGSF and 3.74 for 
EnKF. It is notable that this number for SIR filter with 2000 particles was 3.39. 

3.4- Example 4- Forty dimensional Lorenz95 model 

Lorenz95 model [5T] is a system of coupled nonlinear ordinary differential equations 
defined by: 

—j± = (xj+i - Xj- 2 )xj-i -Xj + F, j = 1 • • • m (32) 

with periodic boundary conditions defined by sc_i = x m _i , x = x m and X\ = x m+ i. 
This model resembles dynamics of the atmosphere and has been shown to behave 
chaotically for F greater than 4.0 and m = 40 [22]. Though the dimension of state 
space for this model is still far from those of the real world, It is considered as a 
challenging problem from the perspective of data assimilation because of its highly 
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Figure 7: (top) The reference solution for x. (bottom) The absolute value of the difference between 
the reference solution and the estimates obtained by EnKF (dotted line) and EnGSF (solid line) 
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Figure 8: (top) The reference solution for y. (bottom) The absolute value of the difference between 
the reference solution and the estimates obtained by EnKF (dotted line) and EnGSF (solid line) 
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chaotic nature [32J. 

Here we set m = 40, F = 8.0 and the initial condition is chosen to be (xj = 
F, j 7^ 20) and 220 = 1.001F. A forth order Runge-Kutta scheme with a time step 
of At = 0.01 is used to integrate the model. It is shown in [22] that a time step 
of 0.05 unit in the model corresponds to 6-h in real life. Observations are collected 
for all variables every 5 time step (6-h) for a period of 5000 time step after a spinup 
period of 2000 time steps. Note that this choice of observational time corresponds 
to a regime where the effect of non-Gaussian statistics are significant [32} [25] . The 
observational noise is considered Gaussian according to iV(0, 2I40). The initial prior 
pdf is considered Gaussian with uncorrelated variables each having mean and variance 
equal to 2.0. The model noise is applied as dq = ervAt dt; where the variance a 2 is 
set to 25.0 and d£ is drawn from unit normal. 



1.6 




' 200 400 600 800 1000 

N 

Figure 9: Time averaged RMSE vs. number of particles for Lorenz95 model using EnGSF and EnKF 

Fig. ([9]) shows the time-averaged RMSE obtained for Lorenz95 model using En- 
GSF and EnKF with variable number of particles. Clearly, the EnKF estimate does 
not improve as the number of particles increases, a phenomenon observed in previous 
examples too. EnGSF, on the other hand shows a monotonic convergence with the 
number of particles and produces better estimates compared to EnKF in this setting. 

4. Conclusion 

We have presented a new methodology, the EnGSF, for nonlinear tracking and 
estimation problem. The EnGSF, as shown through several numerical examples, 
has the ability to represent non-Gaussian statistics more accurately than currently 
operational data assimilation schemes such as EnKF. This may be beneficial for 
dynamics with highly non-linear behavior. 
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Moreover, as computers become more powerful, one would expect that the number 
of forward runs one can perform in a certain time interval increases and so data 
assimilation can be carried out with more particles. This brings about the need 
for data assimilation schemes that could effectively utilize higher order statistical 
moments than only the first two. The convergence results of the EnGSF presented in 
this paper show that the method scales well with the number of particles as opposed 
to EnKF and its variants that are optimal with respect to only the first two moments. 

Another advantage of the EnGSF is its affordable computational cost compared to 
other nonlinear estimation techniques. For instance, in compound methods that use 
the posterior pdf obtained by EnKF as the proposal density for SIS filter (cf. [23J 132]), 
the computational cost while calculating the weights may become cumbersome for 
large-scale problems, though these methods are certainly attractive in that a more 
accurate representation of pdf is obtained as a result of two stage filtering. Note that 
the computational cost of EnGSF is proportional to that of EnKF. 

The EnGSF has clear connections to SIS filter and hence one expects that new 
resampling ideas such as those introduced in [25] enhance the performance of the 
method though not explored in this paper. On the other hand, the methodology 
can be looked as a weighted collection of EnKF's acting together. Therefore, all the 
implementation aspects of the EnKF, such as localization, are also applicable for the 
EnGSF. 
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